Adapting Skyline Computation to the MapReduce Framework: Algorithms and Experiments
Abstract
This paper addresses the problem of skyline computation under the MapReduce framework. As a parallel programming model for data-intensive computing applications, MapReduce runs on a cluster of commercial PCs and is built around task decomposition and result reduction. Based on different data partitioning strategies, three MapReduce-style skyline computation algorithms are developed: MapReduce-based BNL (MR–BNL), MapReduce-based SFS (MR–SFS) and MapReduce-based Bitmap (MR–Bitmap). Extensive experiments evaluate and compare the three algorithms under different settings of data distribution, dimensionality, buffer size and cluster size.
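As a rough illustration of the general idea behind a partition-based skyline in the MapReduce style, the Python sketch below computes a local skyline per partition with a block-nested-loops pass (the map step) and then merges the local results with one more pass (the reduce step). It is not the paper's MR–BNL algorithm. The round-robin partitioning, the unbounded in-memory window, the "smaller is better" convention, and all function names (`dominates`, `bnl_skyline`, `partition`, `mr_skyline`) are assumptions made for this example.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    """True if p is no worse than q in every dimension and strictly
    better in at least one (assumption: smaller values are better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def bnl_skyline(points: Sequence[Point]) -> List[Point]:
    """Block-nested-loops skyline with an unbounded in-memory window.
    A real BNL bounds the window and spills overflow to disk."""
    window: List[Point] = []
    for p in points:
        if any(dominates(w, p) for w in window):
            continue  # p is dominated, so it cannot be in the skyline
        # p survives: drop every window point that p dominates
        window = [w for w in window if not dominates(p, w)]
        window.append(p)
    return window


def partition(points: Sequence[Point], n: int) -> List[List[Point]]:
    """Hypothetical round-robin partitioning into n groups."""
    return [list(points[i::n]) for i in range(n)]


def mr_skyline(points: Sequence[Point], num_partitions: int = 2) -> List[Point]:
    # Map step: each partition computes its local skyline independently.
    local_skylines = [bnl_skyline(part) for part in partition(points, num_partitions)]
    # Reduce step: merge local skylines and filter once more.
    merged = [p for sky in local_skylines for p in sky]
    return bnl_skyline(merged)


if __name__ == "__main__":
    pts = [(1, 5), (2, 2), (5, 1), (3, 3), (4, 4), (2, 6)]
    print(sorted(mr_skyline(pts, num_partitions=2)))
```

The merge step is needed because a point that survives in its own partition, such as (4, 4) in the first partition here, can still be dominated by a point from another partition.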
Similar resources
Efficient Skyline Computation in MapReduce
Skyline queries are useful for finding interesting tuples from a large data set according to multiple criteria. The sizes of data sets are constantly increasing, and the architecture of back-ends is switching from single-node environments to non-conventional paradigms like MapReduce. Despite the usefulness of skyline queries, existing works on skyline computation in MapReduce do not take full a...
Parallel Computation of Skyline and Reverse Skyline Queries Using MapReduce
The skyline operator and its variants, such as the dynamic skyline and reverse skyline operators, have attracted considerable attention recently due to their broad applications. However, computing such operators is challenging today because applications increasingly deal with big data. For such data-intensive applications, the MapReduce framework has been widely used recent...
Efficient Skyline Computation for Large Volume Data in MapReduce Utilising Multiple Reducers
A skyline query is useful for extracting a complete set of interesting tuples from a large data set according to multiple criteria. The sizes of data sets are constantly increasing, and the architecture of backends is switching from single-node environments to cluster-oriented setups. Previous work has presented ways to run the skyline query in these setups using the MapReduce framework, but th...
Simultaneous Processing of Multi-Skyline Queries with MapReduce
With the rapid increase in the number of applications as well as the sizes of data, multi-query processing on the MapReduce framework has gained much attention. Meanwhile, there has been much interest in skyline query processing due to its power in multi-criteria decision making and analysis. Recently, there have been attempts to optimize multi-query processing in MapReduce. However, they are not ...
Processing of Probabilistic Skyline Queries Using MapReduce
There has been growth in the number of applications that naturally generate large volumes of uncertain data. With the advent of such applications, support for advanced analysis queries such as the skyline and its variant operators over big uncertain data has become important. In this paper, we propose the effective parallel algorithms using MapReduce to process the probabilistic skyl...